Inductive reasoning is a core component of human intelligence. In the past research of inductive reasoning within computer science, logic language is used as representations of knowledge (facts and rules, more specifically). However, logic language can cause systematic problems for inductive reasoning such as disability of handling raw input such as natural language, sensitiveness to mislabeled data, and incapacity to handle ambiguous input. To this end, we propose a new task, which is to induce natural language rules from natural language facts, and create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language. New automatic metrics are also proposed and analysed for the evaluation of this task. With DEER, we investigate a modern approach for inductive reasoning where we use natural language as representation for knowledge instead of logic language and use pretrained language models as ''reasoners''. Moreover, we provide the first and comprehensive analysis of how well pretrained language models can induce natural language rules from natural language facts. We also propose a new framework drawing insights from philosophy literature for this task, which we show in the experiment section that surpasses baselines in both automatic and human evaluations.
translated by 谷歌翻译
By adopting popular pixel-wise loss, existing methods for defocus deblurring heavily rely on well aligned training image pairs. Although training pairs of ground-truth and blurry images are carefully collected, e.g., DPDD dataset, misalignment is inevitable between training pairs, making existing methods possibly suffer from deformation artifacts. In this paper, we propose a joint deblurring and reblurring learning (JDRL) framework for single image defocus deblurring with misaligned training pairs. Generally, JDRL consists of a deblurring module and a spatially invariant reblurring module, by which deblurred result can be adaptively supervised by ground-truth image to recover sharp textures while maintaining spatial consistency with the blurry image. First, in the deblurring module, a bi-directional optical flow-based deformation is introduced to tolerate spatial misalignment between deblurred and ground-truth images. Second, in the reblurring module, deblurred result is reblurred to be spatially aligned with blurry image, by predicting a set of isotropic blur kernels and weighting maps. Moreover, we establish a new single image defocus deblurring (SDD) dataset, further validating our JDRL and also benefiting future research. Our JDRL can be applied to boost defocus deblurring networks in terms of both quantitative metrics and visual quality on DPDD, RealDOF and our SDD datasets.
translated by 谷歌翻译
从新闻文章中提取事件的信息论点是信息提取的一个具有挑战性的问题,这需要对每个文档的全球上下文理解。尽管有关文档级提取的最新工作已经超越了单句子,并提高了端到端模型的跨句子推理能力,但它们仍然受到某些输入序列长度约束的限制,通常忽略事件之间的全局上下文。为了解决此问题,我们通过构建文档存储器存储来记录上下文事件信息,并利用它隐含,明确地帮助解码以后事件的参数,从而引入了一个新的基于全局神经生成的框架,以用于文档级事件参数提取提取文档级别的事件参数提取。经验结果表明,我们的框架的表现要优于先验方法,并且使用约束的解码设计对对抗注释的示例更为强大。 (我们的代码和资源可在https://github.com/xinyadu/memory_docie上获得研究目的。)
translated by 谷歌翻译
在这项工作中,我们提出了一种基于物理信息引导元进化策略(ES)的新型数据驱动的实时电力系统电压控制方法。主要目标是快速提供自适应控制策略来减轻故障引起的延迟电压恢复(FIDVR)问题。已经为相同或类似的具有挑战性的控制问题制定了强化学习方法,但它们遭受培训效率低下,“角落或看不见”情景缺乏鲁棒性。另一方面,在电力系统中开发了广泛的物理知识,但基于学习的方法很少有利于。为了解决这些挑战,我们介绍了可训练的动作掩模技术,以灵活地将物理知识嵌入到RL模型中,以排除不必要或不利的行动,并达到样本效率,控制性能和鲁棒性的显着改善。此外,我们的方法利用过去学习体验来导出代理梯度,以指导和加速培训勘探过程。与其他最先进的基准方法的IEEE 300座系统和比较案例研究表明了我们方法的有效性和优势。
translated by 谷歌翻译
Humans can classify an unseen category by reasoning on its language explanations. This ability is owing to the compositional nature of language: we can combine previously seen concepts to describe the new category. For example, we might describe mavens as "a kind of large birds with black feathers", so that others can use their knowledge of concepts "large birds" and "black feathers" to recognize a maven. Inspired by this observation, in this work we tackle zero-shot classification task by logically parsing and reasoning on natural language explanations. To this end, we propose the framework CLORE (Classification by LOgical Reasoning on Explanations). While previous methods usually regard textual information as implicit features, CLORE parses the explanations into logical structure the and then reasons along this structure on the input to produce a classification score. Experimental results on explanation-based zero-shot classification benchmarks demonstrate that CLORE is superior to baselines, mainly because it performs better on tasks requiring more logical reasoning. Alongside classification decisions, CLORE can provide the logical parsing and reasoning process as a form of rationale. Through empirical analysis we demonstrate that CLORE is also less affected by linguistic biases than baselines.
translated by 谷歌翻译
文档级信息提取(IE)任务最近开始使用端到端的神经网络技术对其句子级别的IE同行进行认真重新审视。但是,对方法的评估在许多维度上受到限制。特别是,Precision/Recell/F1分数通常报道,几乎没有关于模型造成的错误范围的见解。我们基于Kummerfeld和Klein(2013)的工作,为基于转换的框架提出了用于文档级事件和(N- ARY)关系提取的自动化错误分析的框架。我们采用我们的框架来比较来自三个域的数据集上的两种最先进的文档级模板填充方法;然后,为了衡量IE自30年前成立以来的进展,与MUC-4(1992)评估的四个系统相比。
translated by 谷歌翻译
尽管已经对音频驱动的说话的面部生成取得了重大进展,但现有方法要么忽略面部情绪,要么不能应用于任意主题。在本文中,我们提出了情感感知的运动模型(EAMM),以通过涉及情感源视频来产生一次性的情感谈话面孔。具体而言,我们首先提出了一个Audio2Facial-Dynamics模块,该模块从音频驱动的无监督零和一阶密钥点运动中进行说话。然后,通过探索运动模型的属性,我们进一步提出了一个隐性的情绪位移学习者,以表示与情绪相关的面部动力学作为对先前获得的运动表示形式的线性添加位移。全面的实验表明,通过纳入两个模块的结果,我们的方法可以在具有现实情感模式的任意主题上产生令人满意的说话面部结果。
translated by 谷歌翻译
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
Automatic music generation with artificial intelligence typically requires a large amount of data which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) shows no such ability beyond naive repetition. Evaluating generated music is a challenging task, more so is evaluating drum grooves with little precedence in literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
translated by 谷歌翻译